Abstract: In this paper, we investigate the design of multiple-input multiple-output single-user precoders for finite-alphabet signals under the premise of statistical channel-state information at the transmitter. Based on an asymptotic expression for the mutual information of channels exhibiting antenna correlations, we propose a low-complexity iterative algorithm that reduces the computational load of existing approaches by orders of magnitude with only minimal losses in performance. The complexity savings increase with the number of transmit antennas and with the cardinality of the signal alphabet, making it possible to support values thereof that were unwieldy in existing solutions.

I. Introduction 

Although Gaussian signals are capacity-achieving in a multiple-input multiple-output (MIMO) channel under perfect channel-state information (CSI) at the receiver, signals conforming to discrete constellations are transmitted in practice, and the design of precoders optimized for such signal formats is a topic that has gathered momentum in recent years [1-10].

The works in [3-10] consider the problem under the assumption of perfect CSI at the transmitter, which is a reasonable premise in reciprocal or slow-fading channels. Often, though, perfect CSI at the transmitter is an impossibility and only statistical CSI is available there; these are the conditions on which we concentrate here. For Gaussian signals, the design of MIMO precoders with statistical CSI has been addressed in [11-18]. For discrete signals, an iterative precoding algorithm was proposed in [19] and shown to achieve a high ergodic spectral efficiency in simulations. However, the complexity of this complete-search algorithm is exponential in the number of transmit antennas and, even with modest numbers thereof (say, eight), it becomes unwieldy.

The alternative algorithm proposed in this paper drastically reduces the search space, and with it the complexity, but in such a way that the loss in performance, established based on the 3GPP spatial channel model (SCM) [20], is minimal.

The remainder of this paper is organized as follows. Section 
II describes the system model. In Section III, we review the 
complete-search algorithm and propose an idea to reduce its 
computational complexity. In Section IV, we propose a low-complexity precoder design. Numerical results are provided
in Section V, and our main results are summarized in Section 
VI. 

The following notations are adopted throughout the paper: $\mathrm{diag}\{\mathbf{A}\}$ denotes a diagonal matrix containing the diagonal of matrix $\mathbf{A}$, $\mathrm{vec}(\mathbf{A})$ is a column vector containing the stacked columns of matrix $\mathbf{A}$, $[\mathbf{A}]_{mn}$ denotes the $(m,n)$th entry of matrix $\mathbf{A}$, $[\mathbf{a}]_m$ denotes the $m$th entry of vector $\mathbf{a}$, $\mathbf{I}_M$ denotes an $M\times M$ identity matrix, $\mathrm{tr}(\cdot)$ denotes the trace operation, $\det(\cdot)$ denotes the matrix determinant, and $E_{\mathbf{V}}[\cdot]$ represents the expectation with respect to the random variable $\mathbf{V}$, which can be a scalar, vector, or matrix.

II. Signal Model 

Consider a single-user MIMO channel where the transmitter and receiver are equipped with $N_t$ and $N_r$ antennas, respectively. The received signal $\mathbf{y}\in\mathbb{C}^{N_r}$ can be written as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} \tag{1}$$

where $\mathbf{H}\in\mathbb{C}^{N_r\times N_t}$ is a random channel matrix whose $(i,j)$th entry denotes the complex fading coefficient between the $j$th transmit and the $i$th receive antenna, $\mathbf{x}\in\mathbb{C}^{N_t}$ denotes the zero-mean transmitted vector with covariance $\boldsymbol{\Sigma}_x$, and $\mathbf{n}\in\mathbb{C}^{N_r}$ is a zero-mean complex Gaussian noise vector with covariance $\mathbf{I}_{N_r}$. The transmit vector $\mathbf{x}$ satisfies the power constraint

$$\mathrm{tr}(\boldsymbol{\Sigma}_x) \le P. \tag{2}$$

Based on the statistical CSI, and subject to the power constraint, the transmitter needs to optimize $\boldsymbol{\Sigma}_x$ to maximize the ergodic spectral efficiency.

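With $\mathbf{H}$ known at the receiver, the ergodic mutual information between $\mathbf{x}$ and $\mathbf{y}$ is given by [21]

$$I(\mathbf{x};\mathbf{y}) = E_{\mathbf{H}}\left[E_{\mathbf{x},\mathbf{y}}\left[\log_2\frac{p(\mathbf{y}|\mathbf{x},\mathbf{H})}{p(\mathbf{y}|\mathbf{H})}\,\middle|\,\mathbf{H}\right]\right]. \tag{3}$$

In (3), $p(\mathbf{y}|\mathbf{x},\mathbf{H})$ denotes the probability density function (p.d.f.) of $\mathbf{y}$ conditioned on $(\mathbf{x},\mathbf{H})$, and $p(\mathbf{y}|\mathbf{H})$ denotes the p.d.f. of $\mathbf{y}$ conditioned on $\mathbf{H}$.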
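For a fixed channel realization, the inner expectation in (3) is usually evaluated by Monte Carlo averaging over the noise, since all normalization constants of the Gaussian p.d.f.s cancel in the ratio. The sketch below is a minimal NumPy illustration under stated assumptions (unit-variance complex Gaussian noise as in (1), equiprobable unit-energy QPSK entries, no precoding); the helper names `qpsk`, `all_vectors`, and `mi_finite_alphabet` are hypothetical and not taken from the paper. Averaging its output over channel draws approximates (3).

```python
import itertools
import numpy as np

def qpsk():
    """Unit-energy QPSK points (+-1 +-1j)/sqrt(2)."""
    return np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def all_vectors(points, n):
    """All len(points)**n candidate transmit vectors, one per row."""
    return np.array(list(itertools.product(points, repeat=n)))

def mi_finite_alphabet(H, points, n_noise=200, seed=0):
    """Monte Carlo estimate (bits) of I(x; y | H) for y = Hx + n, n ~ CN(0, I),
    with x uniform over all vectors whose entries lie in `points`."""
    rng = np.random.default_rng(seed)
    nr, nt = H.shape
    X = all_vectors(points, nt)            # K x nt, K = M**nt
    K = X.shape[0]
    HX = X @ H.T                           # row m is (H x_m)^T
    acc = 0.0
    for _ in range(n_noise):
        n = (rng.standard_normal(nr) + 1j * rng.standard_normal(nr)) / np.sqrt(2)
        # a[m, k] = -(||H(x_m - x_k) + n||^2 - ||n||^2)
        diff = HX[:, None, :] - HX[None, :, :] + n
        a = -(np.sum(np.abs(diff) ** 2, axis=2) - np.sum(np.abs(n) ** 2))
        amax = a.max(axis=1)
        lse = amax + np.log(np.exp(a - amax[:, None]).sum(axis=1))  # stable log-sum-exp over k
        acc += lse.sum()
    return np.log2(K) - acc / (K * n_noise * np.log(2))

# Example: two antennas, identity channel (0 dB per stream)
print(mi_finite_alphabet(np.eye(2), qpsk()))
```

The $K\times K$ pairwise term, with $K = M^{N_t}$, is what makes the cost of this computation scale as $M^{2N_t}$.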








III. Complete-Search Precoder Design 

In this section, we review the complete-search approach for the optimization of $\boldsymbol{\Sigma}_x$ and introduce an idea to reduce its computational load in the case where instantaneous CSI is available at the transmitter. Then, in the next section, we extend this idea to the case where only statistical CSI is available at the transmitter.

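Let $\mathbf{x} = \mathbf{B}\mathbf{d}$, where $\mathbf{d}\in\mathbb{C}^{N_t\times 1}$ is the signal vector, whose entries are drawn from an equiprobable constellation of size $M$, whereas $\mathbf{B}\in\mathbb{C}^{N_t\times N_t}$ is the precoder. Let $\mathbf{d}_m$ denote the $m$th element of the resulting vector constellation. Consider the singular value decomposition (SVD) $\mathbf{B} = \mathbf{U}_B\boldsymbol{\Lambda}_B\mathbf{V}_B$, where $\boldsymbol{\Lambda}_B\in\mathbb{C}^{N_t\times N_t}$ is diagonal while $\mathbf{U}_B\in\mathbb{C}^{N_t\times N_t}$ and $\mathbf{V}_B\in\mathbb{C}^{N_t\times N_t}$ are unitary.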

When Gaussian-signal precoding solutions are applied to discrete constellations, the performance suffers because, in the face of major power variations between MIMO subchannels, these solutions insist on beamforming over an extensive range of signal-to-noise ratios (SNRs), well beyond the point where beamforming is appropriate for a discrete constellation. With beamforming, signalling is only possible over the dominant subchannel, which causes a performance loss with discrete signals (cf. [4,19]). By properly designing $\mathbf{U}_B$, $\boldsymbol{\Lambda}_B$, and $\mathbf{V}_B$, the complete-search precoder design minimizes this loss [4,19]. Thereby, the matrix $\mathbf{V}_B$ mixes the $N_t$ original signals into $N_t$ beams, then $\boldsymbol{\Lambda}_B$ allocates power to those beams, and finally $\mathbf{U}_B$ aligns them spatially as they are launched onto the channel. With a proper choice of $\mathbf{V}_B$, all the $N_t$ signals can be effectively transmitted even if only a single beam is active.

The following example illustrates the roles of $\mathbf{U}_B$, $\boldsymbol{\Lambda}_B$, and $\mathbf{V}_B$.

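Example 1: Consider a $4\times 4$ deterministic channel $\mathbf{A}$ with SVD $\mathbf{A} = \mathbf{U}_A\boldsymbol{\Lambda}_A\mathbf{V}_A$, which is perfectly known at the transmitter. The received signal is given by

$$\mathbf{y} = \mathbf{A}\mathbf{U}_B\boldsymbol{\Lambda}_B\mathbf{V}_B\mathbf{d} + \mathbf{n} \tag{4}$$

where $\mathbf{d} = [d_1, d_2, d_3, d_4]^T$. From [4, Prop. 2], the optimal design satisfies $\mathbf{U}_B = \mathbf{V}_A^H$. Then, based on [4, Eq. (8)], (4) can be rewritten as

$$\tilde{\mathbf{y}} = \begin{bmatrix}\alpha_1\lambda_1 & & \\ & \ddots & \\ & & \alpha_4\lambda_4\end{bmatrix}\begin{bmatrix}v_{11} & \cdots & v_{14}\\ \vdots & \ddots & \vdots\\ v_{41} & \cdots & v_{44}\end{bmatrix}\mathbf{d} + \tilde{\mathbf{n}} \tag{5}$$

where $\tilde{\mathbf{y}} = \mathbf{U}_A^H\mathbf{y}$ and $\tilde{\mathbf{n}} = \mathbf{U}_A^H\mathbf{n}$, while $\alpha_i$ and $\lambda_i$ are the diagonal entries of $\boldsymbol{\Lambda}_A$ and $\boldsymbol{\Lambda}_B$, respectively, and $v_{ij} = [\mathbf{V}_B]_{ij}$.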

Assume two of the subchannel gains, say $\alpha_2$ and $\alpha_4$, are very weak. Then, with a Gaussian-signal precoder, the powers allocated to the corresponding subchannels will be very small even at moderate SNRs. Since $\mathbf{V}_B$ is immaterial with Gaussian signals, $d_2$ and $d_4$ then cannot be transmitted. With a proper $\mathbf{V}_B$, in contrast, the received signal equals

$$[\tilde{\mathbf{y}}]_i = \alpha_i\lambda_i\sum_{j=1}^{4}v_{ij}d_j + [\tilde{\mathbf{n}}]_i, \quad i = 1,2,3,4 \tag{6}$$

and now, even if $\alpha_2\lambda_2 \approx 0$ and $\alpha_4\lambda_4 \approx 0$, $d_2$ and $d_4$ can still be effectively transmitted along the other subchannels.
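Example 1 can be checked numerically by counting how many of the $4^4 = 256$ QPSK vectors $\mathbf{d}$ produce distinct noiseless outputs in (5) when $\alpha_2\lambda_2 = \alpha_4\lambda_4 = 0$. The sketch assumes unit gains on the two strong subchannels and, purely for illustration, a mixing matrix built from two hypothetical $2\times 2$ rotations (`pair_rotation`, with arbitrarily chosen angles); it is not the optimized $\mathbf{V}_B$ of [4,19].

```python
import itertools
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def pair_rotation(theta, phi):
    """Illustrative 2x2 unitary mixing block (an assumption, not the optimized V_B)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s * np.exp(1j * phi)],
                     [-s * np.exp(-1j * phi), c]])

def count_distinct_outputs(gains, V, decimals=9):
    """Count distinct noiseless outputs diag(gains) @ V @ d over all QPSK vectors d."""
    outputs = set()
    for d in itertools.product(QPSK, repeat=len(gains)):
        y = np.diag(gains) @ V @ np.array(d)
        outputs.add(tuple(np.round(y, decimals)))
    return len(outputs)

gains = np.array([1.0, 0.0, 1.0, 0.0])      # alpha_2*lambda_2 = alpha_4*lambda_4 = 0
V_mix = np.zeros((4, 4), dtype=complex)
V_mix[:2, :2] = pair_rotation(0.4, 0.3)     # mixes d_1 with d_2
V_mix[2:, 2:] = pair_rotation(0.4, 0.3)     # mixes d_3 with d_4
print(count_distinct_outputs(gains, np.eye(4)))   # 16: d_2 and d_4 are lost
print(count_distinct_outputs(gains, V_mix))       # 256: every d remains distinguishable
```

With $\mathbf{V}_B = \mathbf{I}$ only $d_1$ and $d_3$ reach the receiver, whereas the mixing matrix keeps all 256 transmit vectors distinguishable even though two subchannels are switched off.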


As indicated by (6), an adequate design for discrete constellations in general mixes all the signals $\{d_1, d_2, d_3, d_4\}$ and transmits the ensuing beams on different subchannels. As a result, the search space for computing the mutual information with finite-alphabet inputs grows exponentially with $N_t$ [4].

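Intuitively, though, if there are only two weak subchannels, say $\alpha_2$ and $\alpha_4$ in Example 1, it is not necessary to mix all the signals. It suffices to mix $d_2$ with $d_1$ and $d_4$ with $d_3$, and to transmit the ensuing beams on the stronger subchannels $\alpha_1$ and $\alpha_3$. This corresponds to

$$\mathbf{V}_B = \begin{bmatrix}v_{11} & v_{12} & 0 & 0\\ v_{21} & v_{22} & 0 & 0\\ 0 & 0 & v_{33} & v_{34}\\ 0 & 0 & v_{43} & v_{44}\end{bmatrix} \tag{7}$$

which, plugged into (5), gives

$$[\tilde{\mathbf{y}}]_i = \alpha_i\lambda_i\sum_{j=1}^{2}v_{ij}d_j + [\tilde{\mathbf{n}}]_i, \quad i = 1,2 \tag{8}$$

$$[\tilde{\mathbf{y}}]_i = \alpha_i\lambda_i\sum_{j=3}^{4}v_{ij}d_j + [\tilde{\mathbf{n}}]_i, \quad i = 3,4. \tag{9}$$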

Observe from (8) and (9) that $(d_1, d_2)$ and $(d_3, d_4)$ are decoupled. If the entries of $\mathbf{d}$ are drawn from a quadrature phase shift keying (QPSK) constellation, then the search space for computing the mutual information with finite-alphabet inputs in (8) and (9) is of dimension $2\times 4^{2\times 2} = 512$ [4]. In contrast, for the complete search in (6), it is of dimension $4^{2\times 4} = 65536$. Since $d_2$ and $d_4$ are still transmitted, the structure in (7) may perform close to the complete-search design, but with a substantially lower computational complexity. This observation is the basis for the low-complexity precoder design proposed in the next section.
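The two dimensions quoted above count the (transmitted, hypothesized) vector pairs in the double sum of the finite-alphabet mutual information, $M^{2N}$ for $N$ jointly mixed signals, added up over decoupled groups. A quick check with a hypothetical helper `search_space`; the eight-antenna lines simply extrapolate the same counting rule to the case mentioned in the Introduction.

```python
def search_space(M, Ns, S=1):
    """Candidate pairs M**(2*Ns) per jointly mixed group, summed over S decoupled groups."""
    return S * M ** (2 * Ns)

print(search_space(4, Ns=2, S=2))   # 512: structure (7), QPSK
print(search_space(4, Ns=4, S=1))   # 65536: complete search (6), QPSK
print(search_space(4, Ns=8, S=1))   # 4294967296: complete search with 8 antennas
print(search_space(4, Ns=2, S=4))   # 1024: pairwise decoupling with 8 antennas
```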

IV. Low-Complexity Precoder Design 

In this section, we extend the idea above to the case where only statistical CSI is available at the transmitter. First, we introduce the channel model. Then, we provide an asymptotic (large-system limit) expression of the ergodic mutual information in (3). Based on this asymptotic expression, we study precoder structures, based on which a low-complexity numerical algorithm is proposed to design the precoder.

A. Channel Model 

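Inspired by (8) and (9), we propose a low-complexity design to maximize the ergodic spectral efficiency in (3). Thereby, we consider the popular Kronecker channel model [22]

$$\mathbf{H} = \boldsymbol{\Psi}_R^{1/2}\mathbf{W}\boldsymbol{\Psi}_T^{1/2} \tag{10}$$

where $\boldsymbol{\Psi}_R\in\mathbb{C}^{N_r\times N_r}$ and $\boldsymbol{\Psi}_T\in\mathbb{C}^{N_t\times N_t}$ are the receive and transmit correlation matrices, respectively, while $\mathbf{W}\in\mathbb{C}^{N_r\times N_t}$ is a random matrix whose entries are independent and identically distributed (IID) complex Gaussians. The eigenvalue decompositions of $\boldsymbol{\Psi}_R$ and $\boldsymbol{\Psi}_T$ are

$$\boldsymbol{\Psi}_R = \mathbf{U}_R\boldsymbol{\Lambda}_R\mathbf{U}_R^H \tag{11}$$

$$\boldsymbol{\Psi}_T = \mathbf{U}_T\boldsymbol{\Lambda}_T\mathbf{U}_T^H \tag{12}$$

where $\mathbf{U}_R\in\mathbb{C}^{N_r\times N_r}$ and $\mathbf{U}_T\in\mathbb{C}^{N_t\times N_t}$ are unitary matrices, and $\boldsymbol{\Lambda}_R\in\mathbb{R}^{N_r\times N_r}$ and $\boldsymbol{\Lambda}_T\in\mathbb{R}^{N_t\times N_t}$ are diagonal matrices.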

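For this channel model, the optimal left singular matrix $\mathbf{U}_B$ of the precoder $\mathbf{B}$ equals $\mathbf{U}_T$ [19]. From this, using [19, Eq. (5)], and recalling (1) and (10), we can rewrite (1) as

$$\mathbf{y}_{\rm eq} = \mathbf{H}_{\rm eq}\mathbf{x}_{\rm eq} + \tilde{\mathbf{n}} \tag{13}$$

where

$$\mathbf{y}_{\rm eq} = \mathbf{U}_R^H\mathbf{y} \tag{14}$$

$$\mathbf{x}_{\rm eq} = \boldsymbol{\Lambda}_B\mathbf{V}_B\mathbf{d} \tag{15}$$

$$\mathbf{H}_{\rm eq} = \boldsymbol{\Lambda}_R^{1/2}\tilde{\mathbf{W}}\boldsymbol{\Lambda}_T^{1/2} \tag{16}$$

and where $\tilde{\mathbf{n}}$ and $\tilde{\mathbf{W}}$ have the same distributions as $\mathbf{n}$ in (1) and $\mathbf{W}$ in (10), respectively.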
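The rotation behind (13)-(16) can be verified numerically. This sketch draws a Kronecker channel with two randomly generated correlation matrices (`random_correlation` is a hypothetical helper; SCM correlations would instead come from the channel model in [20]) and checks that $\mathbf{U}_R^H\mathbf{H}\mathbf{U}_T = \boldsymbol{\Lambda}_R^{1/2}\tilde{\mathbf{W}}\boldsymbol{\Lambda}_T^{1/2}$ with $\tilde{\mathbf{W}} = \mathbf{U}_R^H\mathbf{W}\mathbf{U}_T$. Because $\tilde{\mathbf{W}}$ is a unitary rotation of an IID complex Gaussian matrix, it has the same distribution as $\mathbf{W}$.

```python
import numpy as np

def random_correlation(n, rng):
    """Hypothetical Hermitian positive-definite correlation matrix with unit diagonal."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    C = A @ A.conj().T + n * np.eye(n)
    d = np.sqrt(np.real(np.diag(C)))
    return C / np.outer(d, d)

def sqrtm_psd(C):
    """Hermitian square root of a PSD matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(C)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

rng = np.random.default_rng(1)
Nr, Nt = 3, 4
Psi_R, Psi_T = random_correlation(Nr, rng), random_correlation(Nt, rng)
W = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
H = sqrtm_psd(Psi_R) @ W @ sqrtm_psd(Psi_T)                  # Kronecker model (10)

lam_R, U_R = np.linalg.eigh(Psi_R)                            # (11)
lam_T, U_T = np.linalg.eigh(Psi_T)                            # (12)
W_tilde = U_R.conj().T @ W @ U_T                              # unitarily rotated W
H_eq = np.diag(np.sqrt(lam_R)) @ W_tilde @ np.diag(np.sqrt(lam_T))   # (16)

# With U_B = U_T, y_eq = U_R^H y sees H_eq acting on x_eq = Lambda_B V_B d.
print(np.allclose(U_R.conj().T @ H @ U_T, H_eq))              # True
```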

B. Mutual Information in the Large-Dimensional Regime

In order to obtain counterparts of (8) and (9) for this setting, we move into the large-dimensional regime [23]. When both $N_r$ and $N_t$ grow large with ratio $c = N_t/N_r$, the mutual information in (3) satisfies [23]

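$$I(\mathbf{x};\mathbf{y}) \approx I_{\rm asy}(\mathbf{x};\mathbf{y}) \tag{17}$$

where

$$I_{\rm asy}(\mathbf{x};\mathbf{y}) = I(\mathbf{x}_{\rm eq};\mathbf{z}_{\rm eq}) + \log_2\det\left(\mathbf{I}_{N_r} + \mathbf{R}_{\rm eq}\right) - \gamma_{\rm eq}\psi_{\rm eq}\log_2 e \tag{18}$$

and $I(\mathbf{x}_{\rm eq};\mathbf{z}_{\rm eq})$ is the mutual information of the diagonal MIMO relationship

$$\mathbf{z}_{\rm eq} = \boldsymbol{\Gamma}_{\rm eq}^{1/2}\mathbf{x}_{\rm eq} + \mathbf{n} \tag{19}$$

where $\mathbf{x}_{\rm eq}$ is given in (15) and $\mathbf{n}\in\mathbb{C}^{N_t}$ is a standard complex Gaussian random vector. The diagonal matrix $\boldsymbol{\Gamma}_{\rm eq}$ is a function of the auxiliary variables $\{\gamma_{\rm eq}, \psi_{\rm eq}, \mathbf{R}_{\rm eq}\}$, which are the solutions of the following set of coupled equations:

$$\boldsymbol{\Gamma}_{\rm eq} = \gamma_{\rm eq}\boldsymbol{\Lambda}_T \tag{20}$$

$$\mathbf{R}_{\rm eq} = \psi_{\rm eq}\boldsymbol{\Lambda}_R \tag{21}$$

$$\gamma_{\rm eq} = \mathrm{tr}\left[(\mathbf{I}_{N_r} + \mathbf{R}_{\rm eq})^{-1}\boldsymbol{\Lambda}_R\right] \tag{22}$$

$$\psi_{\rm eq} = \mathrm{tr}\left(\boldsymbol{\Phi}_{\rm eq}\boldsymbol{\Lambda}_T\right). \tag{23}$$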


Computing $\boldsymbol{\Gamma}_{\rm eq}$ requires finding $\{\gamma_{\rm eq}, \psi_{\rm eq}, \mathbf{R}_{\rm eq}\}$ through the fixed-point equations (20)-(23). The diagonal relationship in (19) does not relate to any physical channel, but is merely an instrument to obtain an asymptotic expression for the mutual information. Nevertheless, we shall take advantage of this relationship.
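Since the MMSE matrix entering (23) depends on $\boldsymbol{\Gamma}_{\rm eq}$ through (19), the system (20)-(23) lends itself to plain fixed-point iteration. The sketch below assumes the fixed-point system as written above, diagonal $\boldsymbol{\Lambda}_R$ and $\boldsymbol{\Lambda}_T$ (so that traces reduce to sums), and a diagonal MMSE matrix. As a loudly labeled stand-in, it uses the closed-form MMSE of independent Gaussian inputs; a finite-alphabet MMSE function with the same signature would take its place. All function names and the example eigenvalues are hypothetical.

```python
import numpy as np

def gaussian_mmse_diag(gamma_diag, p):
    """Stand-in MMSE (assumption): independent Gaussian inputs with powers p on
    z = Gamma^{1/2} x + n give Phi = diag(p / (1 + p * Gamma))."""
    return p / (1.0 + p * gamma_diag)

def solve_fixed_point(lam_R, lam_T, p, mmse_diag=gaussian_mmse_diag,
                      tol=1e-12, max_iter=10_000):
    """Iterate (20)-(23) for diagonal Lambda_R, Lambda_T and a diagonal MMSE matrix.
    Returns gamma_eq, psi_eq and the diagonals of Gamma_eq and R_eq."""
    psi = 0.0
    for _ in range(max_iter):
        R_diag = psi * lam_R                                  # (21)
        gamma = np.sum(lam_R / (1.0 + R_diag))                # (22), diagonal case
        Gamma_diag = gamma * lam_T                            # (20)
        psi_new = np.sum(mmse_diag(Gamma_diag, p) * lam_T)    # (23) with diagonal Phi_eq
        converged = abs(psi_new - psi) < tol
        psi = psi_new
        if converged:
            break
    R_diag = psi * lam_R
    gamma = np.sum(lam_R / (1.0 + R_diag))
    return gamma, psi, gamma * lam_T, R_diag

def asymptotic_mi_gaussian(lam_R, lam_T, p):
    """(18) with the Gaussian stand-in, for which I(x_eq; z_eq) = sum log2(1 + p * Gamma)."""
    gamma, psi, Gamma_diag, R_diag = solve_fixed_point(lam_R, lam_T, p)
    return (np.sum(np.log2(1.0 + p * Gamma_diag))
            + np.sum(np.log2(1.0 + R_diag))
            - gamma * psi * np.log2(np.e))

lam_R = np.array([2.0, 1.0, 0.5, 0.5])
lam_T = np.array([1.5, 1.0, 1.0, 0.5])
p = np.full(4, 0.25)                     # equal power split, P = 1
print(asymptotic_mi_gaussian(lam_R, lam_T, p))
```

Starting from $\psi_{\rm eq} = 0$, the composite update is increasing in $\psi_{\rm eq}$ and bounded, so the iterates rise monotonically to the fixed point.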

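Also necessary for later derivations is the minimum mean square error (MMSE) estimate of $\mathbf{x}_{\rm eq}$ based on (19), which is given by

$$\hat{\mathbf{x}}_{\rm eq} = E\left[\mathbf{x}_{\rm eq}\,|\,\mathbf{z}_{\rm eq}\right]. \tag{24}$$

It will be convenient to define the following MMSE matrix as the covariance of the error vector between the transmitted signal and its estimate:

$$\boldsymbol{\Phi}_{\rm eq} = E\left[(\mathbf{x}_{\rm eq} - \hat{\mathbf{x}}_{\rm eq})(\mathbf{x}_{\rm eq} - \hat{\mathbf{x}}_{\rm eq})^H\right]. \tag{25}$$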
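When no mixing is applied ($\mathbf{V}_B = \mathbf{I}$), the entries of $\mathbf{x}_{\rm eq}$ in (19) are independent and (25) becomes diagonal, each entry being a scalar MMSE. For unit-energy QPSK in $\mathcal{CN}(0,1)$ noise, the in-phase and quadrature parts behave as two BPSK signals at the same SNR, so the complex MMSE equals the BPSK MMSE $1 - E[\tanh(\mathrm{snr} - \sqrt{\mathrm{snr}}\,Z)]$ with $Z\sim\mathcal{N}(0,1)$. The minimal sketch below evaluates it with NumPy's Gauss-HermiteE quadrature (`hermegauss`); the function names and the 80-node rule are assumptions. A function with this signature could serve as the MMSE input when iterating (20)-(23) for unprecoded QPSK; with mixing ($\mathbf{V}_B \neq \mathbf{I}$) the matrix is no longer diagonal and requires enumeration over the constellation.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-HermiteE rule for the weight exp(-z^2/2); dividing by sqrt(2*pi) makes the
# weights sum to 1, so sum(_W * f(_Z)) approximates E[f(Z)] for Z ~ N(0, 1).
_Z, _W = hermegauss(80)
_W = _W / np.sqrt(2.0 * np.pi)

def mmse_bpsk(snr):
    """MMSE of s in {+1, -1} observed as sqrt(snr)*s + N(0, 1): 1 - E[tanh(snr - sqrt(snr) Z)]."""
    snr = np.asarray(snr, dtype=float)[..., None]
    return 1.0 - np.sum(_W * np.tanh(snr - np.sqrt(snr) * _Z), axis=-1)

def mmse_qpsk_diag(gamma_diag, p):
    """Diagonal of Phi_eq in (25) when x_eq = diag(sqrt(p)) d with independent unit-energy
    QPSK entries (V_B = I) and n ~ CN(0, I): each entry is p * mmse_bpsk(gamma * p)."""
    return p * mmse_bpsk(np.asarray(gamma_diag) * p)

print(mmse_bpsk([0.0, 1.0, 10.0]))
```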


C. Precoder Structure 

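Let us divide the transmit signal $\mathbf{d}$ into $S$ streams, each comprising $N_s = N_t/S$ signals. Let the set $\{\ell_1, \ldots, \ell_{N_t}\}$ denote a permutation of $\{1, \ldots, N_t\}$ and let $\boldsymbol{\Lambda}_s\in\mathbb{C}^{N_s\times N_s}$ and $\mathbf{V}_s\in\mathbb{C}^{N_s\times N_s}$ denote a diagonal matrix and a unitary matrix, respectively, for $s = 1, \ldots, S$. $\boldsymbol{\Lambda}_s$ and $\mathbf{V}_s$ will be optimized later. The goal of arranging these $S$ streams as in (8) and (9) prompts the following design steps: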

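1) Structure of $\boldsymbol{\Lambda}_B$: We define

$$[\boldsymbol{\Lambda}_B]_{\ell_j\ell_j} = [\boldsymbol{\Lambda}_s]_{ii} \tag{26}$$

where $i = 1, \ldots, N_s$, $s = 1, \ldots, S$, and $j = (s-1)N_s + i$. Under this structure, the $s$th stream is transmitted along the diagonal entries $\ell_{(s-1)N_s+1}, \ldots, \ell_{(s-1)N_s+N_s}$ of $\boldsymbol{\Gamma}_{\rm eq}$.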

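2) Structure of $\mathbf{V}_B$: We define

$$[\mathbf{V}_B]_{\ell_i\ell_j} = \begin{cases}[\mathbf{V}_s]_{mn}, & \text{if } i = (s-1)N_s + m,\ j = (s-1)N_s + n\\ 0, & \text{otherwise}\end{cases} \tag{27}$$

where $m = 1, \ldots, N_s$, $n = 1, \ldots, N_s$, $s = 1, \ldots, S$, $i = 1, \ldots, N_t$, and $j = 1, \ldots, N_t$. Under this structure, for the $s$th stream the entries of $\mathbf{V}_s$ map only to rows $\ell_{(s-1)N_s+1}, \ldots, \ell_{(s-1)N_s+N_s}$ and columns $\ell_{(s-1)N_s+1}, \ldots, \ell_{(s-1)N_s+N_s}$ of $\mathbf{V}_B$. This yields $S$ mutually decoupled streams at the receiver.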

The design in (7) is a specific instance of (27) with $\{\ell_1, \ldots, \ell_{N_t}\} = \{1, 2, 3, 4\}$ and $S = 2$. Recall how $(d_1, d_2)$ and $(d_3, d_4)$ are indeed decoupled in (8) and (9).

3) Structure of $\mathbf{d}_s$: Finally, we let

$$[\mathbf{d}_s]_i = [\mathbf{d}]_{\ell_j} \tag{28}$$

with $j = (s-1)N_s + i$.

It is noted that, for the precoder design with perfect instantaneous CSI, decoupled structures similar to those in (26)-(28) are presented in [9] based on a per-group precoding technique.
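The index bookkeeping in (26)-(28) is easy to get wrong, so a small sketch may help. It assembles $\boldsymbol{\Lambda}_B$ and $\mathbf{V}_B$ from per-stream blocks for a given permutation and checks numerically that $[\mathbf{x}_{\rm eq}]_{\ell_j} = [\boldsymbol{\Lambda}_s\mathbf{V}_s\mathbf{d}_s]_i$, the relation stated as (29) below. Indices are 0-based, and the function names, example blocks, and permutation are hypothetical. With the identity permutation and $S = 2$ it reproduces the block-diagonal pattern of (7).

```python
import numpy as np

def assemble_precoder(Lambdas, Vs, ell):
    """Build Lambda_B and V_B from per-stream blocks following (26) and (27).

    Lambdas: list of S arrays of length Ns (diagonals of Lambda_s)
    Vs:      list of S unitary (Ns x Ns) matrices V_s
    ell:     permutation (l_1, ..., l_Nt) of 0..Nt-1 (0-based in this sketch)
    """
    S, Ns = len(Vs), Vs[0].shape[0]
    Nt = S * Ns
    Lambda_B = np.zeros((Nt, Nt))
    V_B = np.zeros((Nt, Nt), dtype=complex)
    for s in range(S):
        idx = list(ell[s * Ns:(s + 1) * Ns])     # l_{(s-1)Ns+1}, ..., l_{(s-1)Ns+Ns}
        Lambda_B[idx, idx] = Lambdas[s]           # (26): diagonal entries only
        V_B[np.ix_(idx, idx)] = Vs[s]             # (27): rows and columns indexed by l
    return Lambda_B, V_B

def stream_vectors(d, ell, S):
    """Split d into d_1, ..., d_S following (28): [d_s]_i = [d]_{l_j}, j = (s-1)Ns + i."""
    Ns = len(d) // S
    return [d[list(ell[s * Ns:(s + 1) * Ns])] for s in range(S)]

R = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # example unitary block
Lambdas = [np.array([1.2, 0.8]), np.array([0.9, 1.1])]
ell = [2, 0, 3, 1]                                       # example permutation (0-based)
Lambda_B, V_B = assemble_precoder(Lambdas, [R, R], ell)
d = np.array([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]) / np.sqrt(2)
x_eq = Lambda_B @ V_B @ d                                # (15)
d_s = stream_vectors(d, ell, S=2)
# First stream: [x_eq]_{l_j} = [Lambda_s V_s d_s]_i
print(np.allclose(x_eq[ell[:2]], np.diag(Lambdas[0]) @ R @ d_s[0]))   # True
```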

D. Precoder Optimization 

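Based on (26)-(28), the relationship in (15) can be rewritten as

$$[\mathbf{x}_{\rm eq}]_{\ell_j} = [\boldsymbol{\Lambda}_s\mathbf{V}_s\mathbf{d}_s]_i \tag{29}$$

for $i = 1, \ldots, N_s$, $s = 1, \ldots, S$, and $j = (s-1)N_s + i$. Recalling that $\boldsymbol{\Gamma}_{\rm eq}$ is diagonal, (19) then reduces to

$$[\mathbf{z}_{\rm eq}]_{\ell_j} = [\boldsymbol{\Gamma}_{\rm eq}]_{\ell_j\ell_j}^{1/2}[\mathbf{x}_{\rm eq}]_{\ell_j} + [\mathbf{n}]_{\ell_j}. \tag{30}$$

Eqs. (29) and (30) indicate that each independent data stream $\mathbf{d}_s$ is transmitted along its own $N_s$ separate subchannels without interfering with other streams. Furthermore, the MMSE matrix in (25) then equals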


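$$[\boldsymbol{\Phi}_{\rm eq}]_{\ell_i\ell_j} = \begin{cases}[\boldsymbol{\Phi}_s]_{mn}, & \text{if } i = (s-1)N_s + m,\ j = (s-1)N_s + n\\ 0, & \text{otherwise}\end{cases} \tag{31}$$

where

$$\boldsymbol{\Phi}_s = \boldsymbol{\Lambda}_s\mathbf{V}_s\,E\left[(\mathbf{d}_s - \hat{\mathbf{d}}_s)(\mathbf{d}_s - \hat{\mathbf{d}}_s)^H\right]\mathbf{V}_s^H\boldsymbol{\Lambda}_s \tag{32}$$

is the MMSE matrix of the $s$th stream, observed through

$$[\mathbf{z}_s]_i = [\boldsymbol{\Gamma}_{\rm eq}]_{\ell_j\ell_j}^{1/2}[\boldsymbol{\Lambda}_s\mathbf{V}_s\mathbf{d}_s]_i + [\mathbf{n}]_{\ell_j} \tag{33}$$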



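Algorithm 1: Maximization of $I_{\rm asy}(\mathbf{x};\mathbf{y})$ with respect to $\mathbf{B}$.

1) Initialize $\boldsymbol{\Lambda}_s^{2,(1)}$ and $\mathbf{V}_s^{(1)}$ for $s = 1, \ldots, S$. Fix a maximum number of iterations, $N_{\rm iter}$, and a threshold $\epsilon$.
2) Initialize $\boldsymbol{\Gamma}_{\rm eq}$, $\mathbf{R}_{\rm eq}$, $\gamma_{\rm eq}$, and $\psi_{\rm eq}$ based on (20)-(23), with $\boldsymbol{\Phi}_{\rm eq}$ based on (31). Then, initialize $I_{\rm asy}^{(1)}(\mathbf{x};\mathbf{y})$ based on (17) with $I(\mathbf{x}_{\rm eq};\mathbf{z}_{\rm eq})$ as per (35). Set counter $n = 1$.
3) Update $\boldsymbol{\Lambda}_s^{2,(n)}$ for $s = 1, \ldots, S$ along the gradient direction given by (36).
4) Normalize $\sum_{s=1}^{S}\mathrm{tr}\big(\boldsymbol{\Lambda}_s^{2,(n)}\big) = P$.
5) Update $\mathbf{V}_s^{(n)}$ for $s = 1, \ldots, S$ along the gradient direction given by (37).
6) Update $\boldsymbol{\Gamma}_{\rm eq}$, $\mathbf{R}_{\rm eq}$, $\gamma_{\rm eq}$, and $\psi_{\rm eq}$ based on (20)-(23), (31).
7) Compute $I_{\rm asy}^{(n+1)}(\mathbf{x};\mathbf{y})$ based on (17) and (35). If $I_{\rm asy}^{(n+1)}(\mathbf{x};\mathbf{y}) - I_{\rm asy}^{(n)}(\mathbf{x};\mathbf{y}) > \epsilon$ and $n < N_{\rm iter}$, set $n = n + 1$ and repeat Steps 3-7.
8) Compute $\boldsymbol{\Lambda}_B$ and $\mathbf{V}_B$ based on (26) and (27). Set $\mathbf{B} = \mathbf{U}_T\boldsymbol{\Lambda}_B\mathbf{V}_B$.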


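with

$$\hat{\mathbf{d}}_s = E\left[\mathbf{d}_s\,|\,\mathbf{z}_s\right] \tag{34}$$

and $[\mathbf{z}_s]_i = [\mathbf{z}_{\rm eq}]_{\ell_j}$.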

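The main term in the mutual information in (17) is $I(\mathbf{x}_{\rm eq};\mathbf{z}_{\rm eq})$, which can now be expressed as

$$I(\mathbf{x}_{\rm eq};\mathbf{z}_{\rm eq}) = \sum_{s=1}^{S} I(\mathbf{d}_s;\mathbf{z}_s) \tag{35}$$

based on which the gradients of $I_{\rm asy}(\mathbf{x};\mathbf{y})$ with respect to $\boldsymbol{\Lambda}_s^2$ and $\mathbf{V}_s$ are given by [24, Eq. (22)]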

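$$\nabla_{\boldsymbol{\Lambda}_s^2} I_{\rm asy}(\mathbf{x};\mathbf{y}) = \mathrm{diag}\left\{\mathbf{V}_s\mathbf{E}_s\mathbf{V}_s^H\boldsymbol{\Gamma}_s\right\} \tag{36}$$

$$\nabla_{\mathbf{V}_s} I_{\rm asy}(\mathbf{x};\mathbf{y}) = \boldsymbol{\Gamma}_s\boldsymbol{\Lambda}_s^2\mathbf{V}_s\mathbf{E}_s \tag{37}$$

where

$$\mathbf{E}_s = E\left[(\mathbf{d}_s - \hat{\mathbf{d}}_s)(\mathbf{d}_s - \hat{\mathbf{d}}_s)^H\right] \tag{38}$$

and we define diagonal matrices $\boldsymbol{\Gamma}_s$, for $s = 1, \ldots, S$, with entries $[\boldsymbol{\Gamma}_s]_{ii} = [\boldsymbol{\Gamma}_{\rm eq}]_{\ell_j\ell_j}$.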
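The per-stream quantities in (35)-(38) involve only $M^{N_s}$ candidate vectors per stream, which is where the savings come from. The sketch below estimates $I(\mathbf{d}_s;\mathbf{z}_s)$ and $\mathbf{E}_s$ by Monte Carlo for QPSK and then evaluates (36)-(37) in the form given above. Assumptions, all hypothetical: the function names, the use of $\boldsymbol{\Lambda}_s^2$ as the power variable, and sampling the transmitted vector instead of averaging over all $M^{N_s}$ of them, so that each sample costs $M^{N_s}$ operations rather than the $M^{2N_s}$ of an exact evaluation.

```python
import itertools
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def stream_mi_and_mmse(gamma_s, lam2_s, V_s, n_samples=400, seed=0):
    """Monte Carlo I(d_s; z_s) in bits and E_s of (38) for
    z_s = Gamma_s^{1/2} Lambda_s V_s d_s + n_s, n_s ~ CN(0, I), equiprobable QPSK d_s."""
    rng = np.random.default_rng(seed)
    Ns = len(gamma_s)
    D = np.array(list(itertools.product(QPSK, repeat=Ns)))    # K x Ns, K = 4**Ns
    K = D.shape[0]
    G = np.diag(np.sqrt(gamma_s * lam2_s)) @ V_s               # Gamma_s^{1/2} Lambda_s V_s
    GD = D @ G.T                                               # noiseless outputs
    E = np.zeros((Ns, Ns), dtype=complex)
    acc = 0.0
    for _ in range(n_samples):
        t = rng.integers(K)                                    # transmitted candidate
        n = (rng.standard_normal(Ns) + 1j * rng.standard_normal(Ns)) / np.sqrt(2)
        z = GD[t] + n
        ll = -np.sum(np.abs(z - GD) ** 2, axis=1)              # log-likelihoods + const
        w = np.exp(ll - ll.max())
        post = w / w.sum()                                     # posterior over candidates
        err = D[t] - post @ D                                  # d_s - E[d_s | z_s]
        E += np.outer(err, err.conj())
        acc += (ll.max() + np.log(w.sum()) - ll[t]) / np.log(2)
    return np.log2(K) - acc / n_samples, E / n_samples

def stream_gradients(gamma_s, lam2_s, V_s, E_s):
    """(36)-(37): gradients w.r.t. Lambda_s^2 (as a vector) and V_s; constant
    factors such as log2(e) are left out, as in those expressions."""
    Gam = np.diag(gamma_s)
    grad_lam2 = np.real(np.diag(V_s @ E_s @ V_s.conj().T @ Gam))
    grad_V = Gam @ np.diag(lam2_s) @ V_s @ E_s
    return grad_lam2, grad_V

V = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
mi, E_s = stream_mi_and_mmse(np.array([1.0, 0.2]), np.array([1.0, 1.0]), V)
print(mi, stream_gradients(np.array([1.0, 0.2]), np.array([1.0, 1.0]), V, E_s)[0])
```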

From (35), and from the relationship between $\boldsymbol{\Lambda}_1, \ldots, \boldsymbol{\Lambda}_S$ and $\boldsymbol{\Lambda}_B$ in (26) as well as the relationship between $\mathbf{V}_1, \ldots, \mathbf{V}_S$ and $\mathbf{V}_B$ in (27), we propose Algorithm 1 to optimize $\boldsymbol{\Lambda}_B$ and $\mathbf{V}_B$. In Steps 3 and 5 of Algorithm 1, $\boldsymbol{\Lambda}_s^{2,(n)}$ and $\mathbf{V}_s^{(n)}$ are updated along the gradient direction, with the backtracking line search method used to determine the step size. In Step 4, $\boldsymbol{\Lambda}_s^{2,(n)}$ is normalized to satisfy the power constraint. In Step 6, $\boldsymbol{\Gamma}_{\rm eq}$, $\mathbf{R}_{\rm eq}$, $\gamma_{\rm eq}$, and $\psi_{\rm eq}$ are updated for the new precoder based on (20)-(23), (31). In Step 7, if $n$ is less than the maximum number of iterations and $I_{\rm asy}^{(n+1)}(\mathbf{x};\mathbf{y}) - I_{\rm asy}^{(n)}(\mathbf{x};\mathbf{y})$ is larger than the threshold, we implement the next iteration; otherwise, we compute the final precoder and stop the algorithm.
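Steps 3 and 4, with the backtracking line search mentioned above, can be sketched for the power variables alone. Two assumptions are flagged: the normalization of Step 4 is read as a rescaling of the stacked diagonals of $\boldsymbol{\Lambda}_s^2$ so that they sum to $P$, and the objective below is a Gaussian-input stand-in over parallel subchannels rather than $I_{\rm asy}$; in Algorithm 1 the gradient would come from (36). The acceptance rule only admits non-decreasing steps, in line with the monotonicity argued in Remark 3. Function names are hypothetical.

```python
import numpy as np

def normalize_power(p, P):
    """Step 4 read as a rescaling: clip negatives, then scale so that sum(p) = P."""
    p = np.clip(p, 0.0, None)
    return P * p / p.sum()

def backtracking_ascent_step(f, grad_f, p, P, t0=1.0, beta=0.5, c=1e-4, max_tries=30):
    """One gradient-direction update of the power vector p (Steps 3-4) with backtracking.
    A step t is accepted once the normalized candidate improves f by at least
    c * max(<grad, candidate - p>, 0); if no step qualifies, p is returned unchanged."""
    g = grad_f(p)
    f0 = f(p)
    t = t0
    for _ in range(max_tries):
        cand = normalize_power(p + t * g, P)
        if f(cand) >= f0 + c * max(float(np.dot(g, cand - p)), 0.0):
            return cand
        t *= beta
    return p

# Stand-in objective (assumption): Gaussian-input rate over parallel subchannels.
gains = np.array([2.0, 1.0, 0.25, 0.1])
f = lambda p: float(np.sum(np.log2(1.0 + gains * p)))
grad_f = lambda p: gains / ((1.0 + gains * p) * np.log(2.0))
p = np.full(4, 1.0)                            # P = 4, equal split
for _ in range(50):
    p = backtracking_ascent_step(f, grad_f, p, P=4.0)
print(p, f(p))
```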

Remark 1: For the complete-search design algorithm [4,19], the complexity is dominated by the computation of the mutual information and the MMSE matrix, which grows exponentially with $2N_t$ (as $M^{2N_t}$). For Algorithm 1, in contrast, the complexity of computing the mutual information and the MMSE matrix grows exponentially with $2N_s$. Thus, by choosing proper values of $S$ and $N_s$, Algorithm 1 offers a tradeoff between performance and complexity. At one end, when $S = 1$ and $N_s = N_t$, Algorithm 1 searches the entire space, while at the other end, when $S = N_t$ and $N_s = 1$, Algorithm 1 merely allocates power among the $N_t$ parallel subchannels. Varying $N_s$ from 1 to $N_t$ bridges the gap between separate and fully joint transmission of the $N_t$ original signals.

TABLE I: Run time (sec.) per iteration with BPSK (X: more than one hour).

| $N_t$ | $N_s = 2$ | $N_s = 4$ | $N_s = N_t$ |
|---|---|---|---|
| 4 | 0.0051 | 0.0190 | 0.0190 |
| 8 | 0.0112 | 0.0473 | 11.6209 |
| 16 | 0.0210 | 0.1939 | X |
| 32 | 0.0570 | 0.4111 | X |

TABLE II: Run time (sec.) per iteration with QPSK (X: more than one hour).

| $N_t$ | $N_s = 2$ | $N_s = 4$ | $N_s = N_t$ |
|---|---|---|---|
| 4 | 0.1149 | 21.5350 | 21.5350 |
| 8 | 0.2029 | 23.3442 | X |
| 16 | 0.3001 | 48.1725 | X |
| 32 | 0.7094 | 98.7853 | X |

Remark 2: An adequate choice of $\{\ell_1, \ldots, \ell_{N_t}\}$ is important for Algorithm 1 to perform satisfactorily. As discussed in Section III, the most important step to compensate the performance loss caused by the Gaussian-input design is to pair a strong subchannel with a weak subchannel and transmit the mixed signals along them together. Therefore, the $N_s/2$ largest diagonal entries of $\boldsymbol{\Gamma}_{\rm eq}$ are paired with the $N_s/2$ smallest diagonal entries. Then, the remaining $N_s/2$ largest diagonal entries of $\boldsymbol{\Gamma}_{\rm eq}$ are paired with the remaining $N_s/2$ smallest ones, and so on. This generalizes the two-antenna scheme in [5].
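One reading of the pairing rule in Remark 2, assuming $N_s$ is even and divides $N_t$, is sketched below; the helper name is hypothetical and indices are 0-based. Applied to subchannel gains shaped like Example 1, with $\alpha_2$ and $\alpha_4$ weak, it recovers the grouping of (7).

```python
import numpy as np

def pair_subchannels(gamma_diag, Ns):
    """Ordering (l_1, ..., l_Nt), 0-based, following Remark 2: each group of Ns indices
    takes the next Ns/2 strongest and the next Ns/2 weakest remaining entries."""
    Nt = len(gamma_diag)
    if Ns % 2 or Nt % Ns:
        raise ValueError("this sketch assumes Ns even and Nt divisible by Ns")
    h = Ns // 2
    order = np.argsort(gamma_diag)[::-1]           # strongest first
    ell = []
    for s in range(Nt // Ns):
        strong = order[s * h:(s + 1) * h]
        weak = order[Nt - (s + 1) * h:Nt - s * h]
        ell.extend(int(k) for k in strong)
        ell.extend(int(k) for k in weak)
    return ell

# Example 1 pattern: entries 2 and 4 weak -> groups (1, 2) and (3, 4), printed 0-based
print(pair_subchannels(np.array([5.0, 0.1, 3.0, 0.2]), Ns=2))   # [0, 1, 2, 3]
```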

Remark 3: Since $\boldsymbol{\Lambda}_s^{2,(n)}$ and $\mathbf{V}_s^{(n)}$ are searched along the gradient direction, the mutual information $I_{\rm asy}^{(n)}(\mathbf{x};\mathbf{y})$ evaluated in Step 7 is nondecreasing. Since Algorithm 1 generates a sequence of mutual-information values that is nondecreasing and upper-bounded, this sequence converges. However, due to the nonconvexity of $I_{\rm asy}(\mathbf{x};\mathbf{y})$ in $\boldsymbol{\Lambda}_s^{2,(n)}$ and $\mathbf{V}_s^{(n)}$, Algorithm 1 may only find local optima. As a result, the algorithm is run several times with different random initializations of $\boldsymbol{\Lambda}_s^{2,(n)}$ and $\mathbf{V}_s^{(n)}$, and the final precoder that provides the highest mutual information is retained.
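The multi-start strategy above needs random unitary initializations $\mathbf{V}_s^{(1)}$. A common choice, assumed here rather than taken from the paper, is a Haar-distributed unitary obtained from the QR decomposition of a complex Gaussian matrix with a standard phase correction; the restart wrapper `best_of_restarts` and the toy objective are hypothetical placeholders for full runs of Algorithm 1.

```python
import numpy as np

def random_unitary(n, rng):
    """Haar-distributed n x n unitary: QR of a complex Gaussian matrix, with the
    phases of diag(R) moved into Q so that the result is uniformly distributed."""
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    ph = np.diag(R) / np.abs(np.diag(R))
    return Q * ph                                   # scales column k of Q by ph[k]

def best_of_restarts(run_once, n_restarts, seed=0):
    """Run a (hypothetical) optimizer from several random starts and keep the best.
    run_once(rng) must return (objective_value, solution)."""
    rng = np.random.default_rng(seed)
    results = [run_once(rng) for _ in range(n_restarts)]
    return max(results, key=lambda r: r[0])

# Toy placeholder objective, for illustration only: |[V]_11|^2 of a random 2x2 unitary.
def run_once(rng):
    V = random_unitary(2, rng)
    return float(np.abs(V[0, 0]) ** 2), V

best_val, best_V = best_of_restarts(run_once, n_restarts=10)
print(best_val)
```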

V. Performance Evaluation 

First, let us evaluate the complexity of Algorithm 1 for different values of $N_s$. Matlab is used on an Intel Core i7-4510U 2.6 GHz processor. Tables I-III provide the run time per iteration for various numbers of antennas and constellations, with X indicating that the time exceeds one hour. As predicted, for $N_s = N_t$, the computational complexity grows exponentially with $N_t$ and quickly becomes unwieldy.

Figure 1 depicts the spectral efficiency for the 3GPP SCM (urban scenario, half-wavelength antenna spacing, velocity 36 km/h) for different precoder designs with $N_t = N_r = 4$ and QPSK. A Gauss-Seidel algorithm using stochastic programming is employed to obtain the capacity-achieving precoder [14]. For Algorithm 1, both $N_s = 4$ and $N_s = 2$ are considered and, despite their enormous computational gap (cf. Table II), the difference in performance is minor. Both precoders hug the capacity up to the point where the QPSK cardinality becomes insufficient. The precoder designed via Algorithm 1 gains many dB over an unprecoded transmitter and also over a capacity-achieving precoder applied with QPSK.

TABLE III: Run time (sec.) per iteration with 16-QAM (X: more than one hour).

| $N_t$ | $N_s = 2$ | $N_s = N_t$ |
|---|---|---|
| 4 | 28.0744 | X |
| 8 | 58.3433 | X |
| 16 | 106.6022 | X |
| 32 | 233.2293 | X |

Fig. 1: Spectral efficiency versus SNR for the 3GPP SCM for different precoder designs with $N_t = N_r = 4$ and QPSK.

Figure 2 contrasts the spectral efficiency given by the asymptotic expression in (17) with the exact form in (3) for the precoders obtained by Algorithm 1 with $N_s = 2$. The channel model is the same as for Fig. 1. The perfect match between the two curves confirms that (17) is a very good proxy for (3), and hence that Algorithm 1 is indeed effective even for small numbers of antennas.

Figures 3 and 4 present further results for the same SCM parameter settings as in Figures 1 and 2, for $N_t = N_r = 32$ and for QPSK and 16-quadrature amplitude modulation (QAM), respectively. We set $N_s = 4$ for the former and $N_s = 2$ for the latter. We note that precoder designs for such large arrays were, to the best of the authors' knowledge, not available hitherto for discrete signals (except for [10], which was available online after the submission of this paper). As the numbers of antennas grow, the conditioning of the transmit correlation matrix $\boldsymbol{\Psi}_T$ becomes progressively poorer [25] and the performance of the capacity-achieving precoder applied to discrete signals degrades, even failing to achieve the saturation spectral efficiency of $N_t\log_2 M$ b/s/Hz at relevant SNRs; some subchannels are simply never activated by a precoder intended for Gaussian signals. Algorithm 1, in contrast, is tailored to finite-cardinality constellations.

Fig. 2: Asymptotic and exact spectral efficiency versus SNR for the 3GPP SCM (urban scenario, half-wavelength antenna spacing, 36 km/h) with $N_t = N_r = 4$ and QPSK.

Fig. 3: Spectral efficiency versus SNR for the 3GPP SCM for different precoder designs with $N_t = N_r = 32$, $N_s = 4$, and QPSK.

Fig. 4: Spectral efficiency versus SNR for the 3GPP SCM for different precoder designs with $N_t = N_r = 32$, $N_s = 2$, and 16-QAM.

VI. Conclusion

With a proper design of $\mathbf{V}_B$ (the right unitary matrix in the SVD of the precoder), it is possible to achieve a satisfactory tradeoff between the need to feed into the channel mixings of multiple finite-cardinality signals and the computational complexity of exploring all possible such mixings. Building on this idea, an algorithm has been proposed that, under the 3GPP SCM channel model, exhibits very good performance with orders-of-magnitude less complexity than complete-search solutions. More refined versions of this algorithm, equipped with alternative subchannel pairing schemes, may perform even better. Additional extensions include the applicability to settings with imperfect CSI or to multiuser contexts, as well as the performance under other channel models.